Karhunen–Loève theorem

In the theory of stochastic processes, the Karhunen–Loève theorem (named after Kari Karhunen and Michel Loève) is a representation of a stochastic process as an infinite linear combination of orthogonal functions, analogous to a Fourier series representation of a function on a bounded interval. Stochastic processes given by infinite series of this form were considered earlier by Damodar Dharmananda Kosambi.[1] There exist many such expansions of a stochastic process: if the process is indexed over [a, b], any orthonormal basis of L2([a, b]) yields an expansion thereof in that form. The importance of the Karhunen–Loève theorem is that it yields the best such basis in the sense that it minimizes the total mean squared error.

In contrast to a Fourier series, where the coefficients are real numbers and the expansion basis consists of sinusoidal functions (that is, sine and cosine functions), the coefficients in the Karhunen–Loève theorem are random variables and the expansion basis depends on the process. In fact, the orthogonal basis functions used in this representation are determined by the covariance function of the process. One can think of the Karhunen–Loève transform as adapting to the process in order to produce the best possible basis for its expansion.

In the case of a centered stochastic process {Xt}t ∈ [a, b] (where centered means that the expectations E(Xt) are defined and equal to 0 for all values of the parameter t in [a, b]) satisfying a technical continuity condition, Xt admits a decomposition

 X_t = \sum_{k=1}^\infty Z_k e_k(t)

where Zk are pairwise uncorrelated random variables and the functions ek are continuous real-valued functions on [a, b] that are pairwise orthogonal in L2[a, b]. It is therefore sometimes said that the expansion is bi-orthogonal since the random coefficients Zk are orthogonal in the probability space while the deterministic functions ek are orthogonal in the time domain. The general case of a process Xt that is not centered can be brought back to the case of a centered process by considering (Xt − E(Xt)) which is a centered process.

Moreover, if the process is Gaussian, then the random variables Zk are Gaussian and stochastically independent. This result generalizes the Karhunen–Loève transform. An important example of a centered real stochastic process on [0,1] is the Wiener process; the Karhunen–Loève theorem can be used to provide a canonical orthogonal representation for it. In this case the expansion consists of sinusoidal functions.

The above expansion into uncorrelated random variables is also known as the Karhunen–Loève expansion or Karhunen–Loève decomposition. The empirical version (i.e., with the coefficients computed from a sample) is known as the Karhunen–Loève transform (KLT), principal component analysis, proper orthogonal decomposition (POD), empirical orthogonal functions (a term used in meteorology and geophysics), or the Hotelling transform.

Formulation

Throughout this section, we consider a square-integrable, zero-mean stochastic process Xt defined over a probability space (Ω, F, P) and indexed over the closed interval [a, b], with continuous covariance function KX(s, t):

\forall t\in [a,b], X_t\in L^2(\Omega,\mathcal{F},\mathrm{P}),
\forall t\in [a,b], \mathrm{E}[X_t]=0,
\forall t,s \in [a,b], K_X(s,t)=\mathrm{E}[X_s X_t].

We associate to KX the linear operator TKX defined by

\begin{array}{rrl}
T_{K_X}: L^2([a,b]) &\rightarrow & L^2([a,b])\\
f(t) & \mapsto & \int_{[a,b]} K_X(s,t) f(s)\, ds
\end{array}

Since TKX is a linear operator, it makes sense to talk about its eigenvalues λk and eigenfunctions ek, which are found by solving the homogeneous Fredholm integral equation of the second kind

\int_{[a,b]} K_X(s,t) e_k(s)\,ds=\lambda_k e_k(t)
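In practice closed-form eigenpairs are available only for a few kernels, and the Fredholm equation is usually solved numerically. The following Python sketch illustrates one possible approach, a simple Nyström-type quadrature discretization on a uniform grid; the helper name kl_eigenpairs and the choice of the kernel min(s, t) as a test case are illustrative assumptions rather than part of the theorem.

import numpy as np

# Sketch: approximate the eigenvalues/eigenfunctions of the covariance operator
# T_{K_X} by discretizing the Fredholm equation on a uniform grid (Nystrom method).
def kl_eigenpairs(kernel, a=0.0, b=1.0, n=500):
    t = np.linspace(a, b, n)
    w = (b - a) / n                       # quadrature weight of each grid node
    K = kernel(t[:, None], t[None, :])    # kernel matrix K_X(s, t)
    vals, vecs = np.linalg.eigh(K * w)    # discretized operator is symmetric
    idx = np.argsort(vals)[::-1]          # eigenvalues in decreasing order
    vals, vecs = vals[idx], vecs[:, idx]
    funcs = vecs / np.sqrt(w)             # rescale so the e_k are orthonormal in L2([a, b])
    return t, vals, funcs

t, lam, e = kl_eigenpairs(lambda s, u: np.minimum(s, u))
print(lam[:4])   # compare with 1/((k - 1/2)^2 pi^2), see the Wiener process example below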

Statement of the theorem

Theorem. Let Xt be a zero-mean square-integrable stochastic process defined over a probability space (Ω,F,P) and indexed over a closed and bounded interval [a, b], with continuous covariance function KX(s,t).

Then KX(s,t) is a Mercer kernel and, letting ek be an orthonormal basis of L2([a, b]) formed by the eigenfunctions of TKX with respective eigenvalues λk, Xt admits the following representation


X_t=\sum_{k=1}^\infty Z_k e_k(t)

where the convergence is in L2, uniform in t and


Z_k=\int_{[a,b]} X_t e_k(t)\, dt

Furthermore, the random variables Zk have zero mean, are uncorrelated, and have variance λk:


\mathrm{E}[Z_k]=0,~\forall k\in\mathbb{N} \quad\quad\mbox{and}\quad\quad \mathrm{E}[Z_i Z_j]=\delta_{ij} \lambda_j,~\forall i,j\in \mathbb{N}

Note that by generalizations of Mercer's theorem we can replace the interval [a, b] with other compact spaces C and the Lebesgue measure on [a, b] with a Borel measure whose support is C.

Proof

Since KX is a continuous Mercer kernel, Mercer's theorem provides the uniformly convergent expansion
K_X(s,t)=\sum_{k=1}^\infty \lambda_k e_k(s) e_k(t)
in terms of the eigenvalues λk and orthonormal eigenfunctions ek of TKX. We expand Xt with respect to this basis,
X_t=\sum_{k=1}^\infty Z_k e_k(t)
where the coefficients (random variables) Zk are given by the projection of Xt on the respective eigenfunctions
Z_k=\int_{[a,b]} X_t e_k(t) \,dt
The coefficients Zk then satisfy
\mathrm{E}[Z_k]=\mathrm{E}\left[\int_{[a,b]} X_t e_k(t) \,dt\right]=\int_{[a,b]} \mathrm{E}[X_t] e_k(t)\, dt=0
and:

\begin{array}[t]{rl}
\mathrm{E}[Z_i Z_j]&=\mathrm{E}\left[ \int_{[a,b]}\int_{[a,b]} X_t  X_s e_j(t)e_i(s) dt\, ds\right]\\
&=\int_{[a,b]}\int_{[a,b]}  \mathrm{E}\left[X_t  X_s\right] e_j(t)e_i(s) dt\, ds\\
&=\int_{[a,b]}\int_{[a,b]}  K_X(s,t) e_j(t)e_i(s) dt \, ds\\
&=\int_{[a,b]} e_i(s)\left(\int_{[a,b]}  K_X(s,t) e_j(t) dt\right) ds\\
&=\lambda_j \int_{[a,b]} e_i(s) e_j(s)  ds\\
&=\delta_{ij}\lambda_j
\end{array}
where we have used the fact that the ek are eigenfunctions of TKX and are orthonormal.

To prove convergence in L2, define the partial sums S_N=\sum_{k=1}^N Z_k e_k(t). Then:

\begin{align}
\mathrm{E}[|X_t-S_N|^2]&=\mathrm{E}[X_t^2]+\mathrm{E}[S_N^2]-2\mathrm{E}[X_t S_N]\\
&=K_X(t,t)+\mathrm{E}\left[\sum_{k=1}^N \sum_{l=1}^N Z_k Z_l e_k(t)e_l(t) \right] -2\mathrm{E}\left[X_t\sum_{k=1}^N Z_k e_k(t)\right]\\
&=K_X(t,t)+\sum_{k=1}^N \lambda_k e_k(t)^2 -2\mathrm{E}\left[\sum_{k=1}^N \int_{[a,b]} X_t X_s e_k(s) e_k(t)\, ds\right]\\
&=K_X(t,t)-\sum_{k=1}^N \lambda_k e_k(t)^2
\end{align}
which goes to 0 uniformly in t by Mercer's theorem.

Properties of the Karhunen–Loève transform

Special case: Gaussian distribution

Since the limit in the mean of jointly Gaussian random variables is jointly Gaussian, and jointly Gaussian random (centered) variables are independent if and only if they are orthogonal, we can also conclude:

Theorem. The variables Zi have a joint Gaussian distribution and are stochastically independent if the original process {Xt}t is Gaussian.

In the Gaussian case, since the variables Zi are independent, we can say more:

 \lim_{N \rightarrow \infty} \sum_{i=1}^N e_i(t) Z_i(\omega) = X_t(\omega)

almost surely.

The Karhunen–Loève transform decorrelates the process

This is a direct consequence of the fact that the coefficients Zk are pairwise uncorrelated by construction (and, when the process is Gaussian, independent).

The Karhunen–Loève expansion minimizes the total mean square error

In the introduction, we mentioned that the truncated Karhunen–Loève expansion is the best approximation of the original process in the sense that it minimizes the total mean square error resulting from its truncation. Because of this property, it is often said that the KL transform optimally compacts the energy.

More specifically, given any orthonormal basis {fk} of L2([a, b]), we may decompose the process Xt as:

X_t(\omega)=\sum_{k=1}^\infty A_k(\omega) f_k(t)

where A_k(\omega)=\int_{[a,b]} X_t(\omega) f_k(t)\,dt

and we may approximate Xt by the finite sum \hat{X}_t(\omega)=\sum_{k=1}^N A_k(\omega) f_k(t) for some integer N.

Claim. Of all such approximations, the KL approximation is the one that minimizes the total mean square error (provided we have arranged the eigenvalues in decreasing order).
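As a rough numerical illustration of this claim (not part of the classical argument), the sketch below compares the total mean square error of an N-term KL truncation with that of a truncation in another orthonormal basis, using the Wiener process from the Examples section as the test process; the grid size, number of sample paths, and the alternative basis are arbitrary choices.

import numpy as np

rng = np.random.default_rng(0)
n, N, n_paths = 500, 10, 2000
t = np.linspace(0, 1, n, endpoint=False) + 0.5 / n      # midpoint grid on [0, 1]
w = 1.0 / n

# Simulate Wiener paths on the grid (covariance min(s, t))
C = np.minimum(t[:, None], t[None, :])
X = np.linalg.cholesky(C + 1e-12 * np.eye(n)) @ rng.standard_normal((n, n_paths))

k = np.arange(1, N + 1)[:, None]
kl_basis  = np.sqrt(2) * np.sin((k - 0.5) * np.pi * t)  # KL eigenfunctions of the Wiener process
alt_basis = np.sqrt(2) * np.sin(k * np.pi * t)          # some other orthonormal basis of L2([0, 1])

def total_mse(basis):
    coeff = basis @ X * w                 # A_k = integral of X_t f_k(t) dt (quadrature)
    recon = basis.T @ coeff               # N-term approximation of each path
    return np.mean(np.sum((X - recon) ** 2, axis=0) * w)

print("KL basis   :", total_mse(kl_basis))   # smaller total mean square error
print("other basis:", total_mse(alt_basis))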

Explained variance

An important observation is that since the random coefficients Zk of the KL expansion are uncorrelated, the Bienaymé formula asserts that the variance of Xt is simply the sum of the variances of the individual components of the sum:


\begin{align}
\mbox{Var}[X_t]&=\sum_{k=1}^\infty e_k(t)^2 \mbox{Var}[Z_k]=\sum_{k=1}^\infty \lambda_k e_k(t)^2 
\end{align}

Integrating over [a, b] and using the orthonormality of the ek, we obtain that the total variance of the process is:


\int_{[a,b]} \mbox{Var}[X_t] dt=\sum_{k=1}^\infty \lambda_k

In particular, the total variance of the N-truncated approximation is \sum_{k=1}^N \lambda_k. As a result, the N-truncated expansion explains \sum_{k=1}^N \lambda_k/\sum_{k=1}^\infty \lambda_k of the variance; and if we are content with an approximation that explains, say, 95% of the variance, then we just have to determine an N\in\mathbb{N} such that \sum_{k=1}^N \lambda_k/\sum_{k=1}^\infty \lambda_k \geq 0.95.
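A small routine can pick such an N from a list of eigenvalues sorted in decreasing order; in the Python sketch below the function name truncation_order is an arbitrary choice, and the Wiener-process eigenvalues from the Examples section serve as test data.

import numpy as np

# Sketch: smallest truncation order N whose eigenvalues explain at least a
# given fraction of the total variance (eigenvalues assumed in decreasing order).
def truncation_order(eigenvalues, threshold=0.95):
    lam = np.asarray(eigenvalues, dtype=float)
    explained = np.cumsum(lam) / lam.sum()
    return int(np.searchsorted(explained, threshold) + 1)

k = np.arange(1, 10001)
lam = 1.0 / ((k - 0.5) ** 2 * np.pi ** 2)    # Wiener-process eigenvalues
print(truncation_order(lam, 0.95))           # order needed to explain 95% of the variance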

The Karhunen–Loève expansion has the minimum representation entropy property

Principal Component Analysis

We have established the Karhunen–Loève theorem and derived a few of its properties. One hurdle in its application, however, is the numerical cost of determining the eigenvalues and eigenfunctions of the covariance operator through the homogeneous Fredholm integral equation of the second kind \int_{[a,b]} K_X(s,t) e_k(s)\,ds=\lambda_k e_k(t).

However, when applied to a discrete and finite process \left(X_n\right)_{n\in\{1,\ldots,N\}}, the problem takes a much simpler form and standard algebra can be used to carry out the calculations.

Note that a continuous process can also be sampled at N points in time in order to reduce the problem to a finite version.

We henceforth consider a random N-dimensional vector X=\left(X_1~X_2~\ldots~X_N\right)^T. As mentioned above, X could contain N samples of a signal, but it can hold many other kinds of data depending on the field of application: for instance, the answers to a survey or economic data in an econometric analysis.

As in the continuous version, we assume that X is centered, otherwise we can let X:=X-\mu_X (where \mu_X is the mean vector of X) which is centered.

Let us adapt the procedure to the discrete case.

Covariance matrix

Recall that the main difficulty in applying the KL transformation is computing the eigenvectors of the linear operator associated with the covariance function, which are given by the solutions to the integral equation written above.

Define Σ, the covariance matrix of X. Σ is an N by N matrix whose elements are given by:

\Sigma_{ij}=E[X_i X_j],\qquad \forall i,j \in \{1,\ldots,N\}

Rewriting the above integral equation to suit the discrete case, we observe that it turns into:


\begin{align}
&\sum_{j=1}^N \Sigma_{ij} e_j=\lambda e_i\\
\Leftrightarrow \quad& \Sigma e=\lambda e
\end{align}

where e=(e_1~e_2~\ldots~e_N)^T is an N-dimensional vector.

The integral equation thus reduces to a simple matrix eigenvalue problem, which explains why the PCA has such a broad domain of applications.

Since Σ is a symmetric positive semi-definite matrix, it possesses a set of orthonormal eigenvectors forming a basis of \R^N, and we write \{\lambda_i,\phi_i\}_{i\in\{1,\ldots,N\}} for this set of eigenvalues and corresponding eigenvectors, listed in decreasing order of λi. Let \Phi also be the orthogonal matrix whose columns are these eigenvectors:


\begin{align}
\Phi &:=\left(\phi_1~\phi_2~\ldots~\phi_N\right)\\
\Phi^T \Phi &=I
\end{align}
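In applications Σ is usually not known exactly and is estimated from observations. The following Python sketch, in which the function name pca_basis and the estimation of Σ from m observations are assumptions made here for illustration, carries out the centering, the covariance estimate, and the eigendecomposition described above.

import numpy as np

def pca_basis(samples):
    # samples: array of shape (m, N); each row is one observation of the N-vector X
    Xc = samples - samples.mean(axis=0)        # center the data (X := X - mu_X)
    Sigma = Xc.T @ Xc / len(Xc)                # estimate of Sigma_ij = E[X_i X_j]
    lam, Phi = np.linalg.eigh(Sigma)           # eigendecomposition of the symmetric matrix
    order = np.argsort(lam)[::-1]              # list eigenvalues in decreasing order
    return lam[order], Phi[:, order]           # columns of Phi are the eigenvectors phi_i

# Example: lam, Phi = pca_basis(np.random.default_rng(0).standard_normal((1000, 5)))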

Principal Component Transform

It remains to perform the actual KL transformation, called the Principal Component Transform in this case. Recall that the transform was found by expanding the process with respect to the basis formed by the eigenvectors of the covariance function. In this case, we hence have:


\begin{align}
X &=\sum_{i=1}^N \langle \phi_i,X\rangle \phi_i\\
&=\sum_{i=1}^N \phi_i^T X \phi_i
\end{align}

In a more compact form, the Principal Component Transform of X is defined by:


\left\{
\begin{array}{rl}
Y&=\Phi^T X\\
X&=\Phi Y
\end{array}
\right.

The i-th component of Y is Y_i=\phi_i^T X, the projection of X on \phi_i and the inverse transform X=\Phi Y yields the expansion of X on the space spanned by the \phi_i:


X=\sum_{i=1}^N Y_i \phi_i=\sum_{i=1}^N \langle \phi_i,X\rangle \phi_i

As in the continuous case, we may reduce the dimensionality of the problem by truncating the sum at some K\in\{1,\ldots,N\} such that \frac{\sum_{i=1}^K \lambda_i}{\sum_{i=1}^N \lambda_i}\geq \alpha where α is the explained variance threshold we wish to set.
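A minimal sketch of the forward transform, the truncation rule, and the reconstruction, reusing the hypothetical pca_basis helper from the previous subsection:

import numpy as np

def pc_transform(x, Phi):
    return Phi.T @ x                      # Y = Phi^T X, with components Y_i = phi_i^T X

def pc_truncate(x, Phi, lam, alpha=0.95):
    # keep the first K components so that they explain a fraction alpha of the variance
    K = int(np.searchsorted(np.cumsum(lam) / np.sum(lam), alpha) + 1)
    Y = Phi[:, :K].T @ x
    return Phi[:, :K] @ Y, K              # rank-K approximation of x and the chosen K

# Example: lam, Phi = pca_basis(samples); x_hat, K = pc_truncate(x, Phi, lam)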

Examples

The Wiener process

There are numerous equivalent characterizations of the Wiener process which is a mathematical formalization of Brownian motion. Here we regard it as the centered standard Gaussian process Wt with covariance function

 K_{W}(t,s)  = \operatorname{Cov}(W_t,W_s) = \min (s,t).

We restrict the time domain to [a,b]=[0,1] without loss of generality.

The eigenfunctions of the covariance kernel are easily determined. These are

 e_k(t) = \sqrt{2} \sin \left(\left(k - \textstyle\frac{1}{2}\right) \pi t\right)

and the corresponding eigenvalues are

 \lambda_k = \frac{1}{(k -\frac{1}{2})^2 \pi^2}.

This gives the following representation of the Wiener process:

Theorem. There is a sequence {Zi}i of independent Gaussian random variables with mean zero and variance 1 such that

 W_t = \sqrt{2} \sum_{k=1}^\infty Z_k \frac{\sin \left(\left(k - \frac{1}{2}\right) \pi t\right)}{ \left(k - \frac{1}{2}\right) \pi}.

Note that this representation is only valid for  t\in[0,1]. On larger intervals, the increments are not independent. As stated in the theorem, convergence is in the L2 norm and uniform in t.
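As a numerical check (not part of the classical presentation), the truncated series can be used to simulate approximate Wiener paths and to compare their empirical covariance with min(s, t); the truncation order, grid size, and number of paths below are arbitrary.

import numpy as np

rng = np.random.default_rng(1)
n_terms, n_grid, n_paths = 200, 100, 5000
t = np.linspace(0, 1, n_grid)
k = np.arange(1, n_terms + 1)[:, None]
basis = np.sqrt(2) * np.sin((k - 0.5) * np.pi * t) / ((k - 0.5) * np.pi)
Z = rng.standard_normal((n_paths, n_terms))       # i.i.d. standard Gaussian coefficients
W = Z @ basis                                     # approximate Wiener paths, shape (n_paths, n_grid)
emp_cov = W.T @ W / n_paths
print(np.abs(emp_cov - np.minimum(t[:, None], t[None, :])).max())   # small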

The Brownian bridge

Similarly, the Brownian bridge B_t=W_t-tW_1, which is a stochastic process with covariance function

K_B(t,s)=\min(t,s)-ts

can be represented as the series

B_t = \sum_{k=1}^\infty Z_k \frac{\sqrt{2} \sin(k \pi t)}{k \pi}
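The same kind of numerical check as for the Wiener process applies here, now with basis functions \sqrt{2}\sin(k\pi t)/(k\pi) and target covariance min(s, t) - st; the parameter values are again arbitrary.

import numpy as np

rng = np.random.default_rng(2)
n_terms, n_grid, n_paths = 200, 100, 5000
t = np.linspace(0, 1, n_grid)
k = np.arange(1, n_terms + 1)[:, None]
basis = np.sqrt(2) * np.sin(k * np.pi * t) / (k * np.pi)
B = rng.standard_normal((n_paths, n_terms)) @ basis        # approximate Brownian bridge paths
emp_cov = B.T @ B / n_paths
target = np.minimum(t[:, None], t[None, :]) - np.outer(t, t)
print(np.abs(emp_cov - target).max())                      # small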

Applications

Adaptive optics systems sometimes use K–L functions to reconstruct wave-front phase information (Dai 1996, JOSA A).

The Karhunen–Loève expansion is closely related to the singular value decomposition (SVD). The latter has myriad applications in image processing, radar, seismology, and the like. If one has independent vector observations from a vector-valued stochastic process, then the left singular vectors are maximum likelihood estimates of the ensemble KL expansion.
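The following minimal numerical illustration of this connection assumes that the sample covariance is formed as the data matrix times its transpose, divided by the number m of centered observations stored as columns; the synthetic data and dimensions are arbitrary.

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 6))
data = A @ rng.standard_normal((6, 1000))          # columns are observations of the random vector
data -= data.mean(axis=1, keepdims=True)           # center

U, s, _ = np.linalg.svd(data, full_matrices=False) # left singular vectors U
lam, Phi = np.linalg.eigh(data @ data.T / data.shape[1])
order = np.argsort(lam)[::-1]
print(np.allclose(np.abs(U), np.abs(Phi[:, order]), atol=1e-6))   # same vectors up to sign
print(np.allclose(s ** 2 / data.shape[1], lam[order]))            # squared singular values / m = eigenvalues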

Notes

  1. Kosambi, D. D. (1943). "Statistics in Function Space". Journal of the Indian Mathematical Society 7: 76–88. MR 9816.
